37 research outputs found
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will over or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of over
and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluate over and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared. Comment: 22 pages, 13 figures, 3 tables
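Finding (i) above, that algorithms differ in both the number and the composition of the communities they return on the same input, can be illustrated with a small sketch. The abstract does not state which similarity measure the authors used; the Rand index below is a swapped-in, standard choice, and the function names and the toy partitions are hypothetical.

```python
from itertools import combinations


def num_communities(partition):
    """Count distinct community labels in a {node: label} mapping."""
    return len(set(partition.values()))


def rand_index(p1, p2):
    """Fraction of node pairs on which two partitions agree.

    A pair counts as an agreement if both partitions put the two nodes
    in the same community, or both put them in different communities.
    (Assumed measure for illustration; not taken from the paper.)
    """
    if set(p1) != set(p2):
        raise ValueError("partitions must cover the same nodes")
    agree = 0
    total = 0
    for u, v in combinations(sorted(p1), 2):
        same1 = p1[u] == p1[v]
        same2 = p2[u] == p2[v]
        agree += int(same1 == same2)
        total += 1
    return agree / total


# Two hypothetical outputs on the same 6-node network.
part_a = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
part_b = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}

print(num_communities(part_a), num_communities(part_b))
print(rand_index(part_a, part_b))
```

A pairwise measure like this depends only on co-membership, so it is unaffected by how each algorithm names its communities.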
Altmetrics Study On Research Outputs In Fields of Social Sciences In Top Iranian Universities
Purpose: The purpose of the present work was an altmetrics study of research outputs in the social and behavioral sciences at major Iranian universities during 2010-2020.
Methodology: The research outputs of major Iranian universities in the subject areas of the social and behavioral sciences, as indexed in the Scopus database, were reviewed. This applied research was conducted with an altmetrics approach. The Scopus and Altmetrics Explorer databases were used to collect data. Data analysis was performed using descriptive and inferential statistical tests in Excel.
Findings: The study revealed that, in the field of social sciences, Shahid Beheshti, Tehran, Tarbiat Modares, Tabriz, and Shiraz universities ranked highest in terms of mentions and bookmarks. In addition, in all the universities surveyed, the most mentions were on Twitter and the most bookmarks were on Mendeley.
Conclusion: Overall, the findings showed that most of the surveyed universities were not in an acceptable position in terms of social media presence and altmetrics score, indicating that their researchers are unfamiliar with the benefits of social media and participate little in sharing their research outputs there.
Detectability thresholds and optimal algorithms for community structure in dynamic networks
We study the fundamental limits on learning latent community structure in
dynamic networks. Specifically, we study dynamic stochastic block models where
nodes change their community membership over time, but where edges are
generated independently at each time step. In this setting (which is a special
case of several existing models), we are able to derive the detectability
threshold exactly, as a function of the rate of change and the strength of the
communities. Below this threshold, we claim that no algorithm can identify the
communities better than chance. We then give two algorithms that are optimal in
the sense that they succeed all the way down to this limit. The first uses
belief propagation (BP), which gives asymptotically optimal accuracy, and the
second is a fast spectral clustering algorithm, based on linearizing the BP
equations. We verify our analytic and algorithmic results via numerical
simulation, and close with a brief discussion of extensions and open questions. Comment: 9 pages, 3 figures
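The model setting described above (nodes change community over time, edges are drawn independently at each time step) can be sketched as a small generator. This is a minimal illustration under stated assumptions, not the authors' exact parameterization: the function name `dynamic_sbm`, the parameters `stay_prob`, `p_in`, and `p_out`, and the rule that a node which does not keep its label draws a new one uniformly at random are all hypothetical choices.

```python
import random


def dynamic_sbm(n, k, steps, stay_prob, p_in, p_out, seed=0):
    """Generate snapshots of a simple dynamic stochastic block model.

    Returns a list of (labels, edges) pairs, one per time step.
    Assumptions (not from the abstract): a node keeps its label with
    probability stay_prob, otherwise it draws a label uniformly from
    the k groups; edges appear with probability p_in inside a group
    and p_out between groups.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]
    snapshots = []
    for _ in range(steps):
        # Edges are drawn fresh at every step, independent of earlier steps.
        edges = []
        for u in range(n):
            for v in range(u + 1, n):
                p = p_in if labels[u] == labels[v] else p_out
                if rng.random() < p:
                    edges.append((u, v))
        snapshots.append((list(labels), edges))
        # Memberships evolve between steps.
        labels = [
            g if rng.random() < stay_prob else rng.randrange(k)
            for g in labels
        ]
    return snapshots


snaps = dynamic_sbm(n=20, k=2, steps=5, stay_prob=0.9, p_in=0.5, p_out=0.05, seed=1)
print(len(snaps), [len(edges) for _, edges in snaps])
```

In this sketch, lowering `stay_prob` makes memberships decorrelate faster between snapshots, and moving `p_in` toward `p_out` weakens the communities; these are the two quantities the abstract says the detectability threshold depends on.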
Limits of Model Selection, Link Prediction, and Community Detection
Relational data has become increasingly ubiquitous. Networks are a rich tool from graph theory that represent real-world interactions through a simple abstract graph of nodes and edges. Network analysis and modeling have attracted wide attention from researchers in many disciplines, including computer science, social science, biology, economics, electrical engineering, and physics. Network analysis is the study of network topology to answer a variety of application-driven questions about the original real-world problem. For example, in social network analysis the questions concern how people interact with each other in online social networks or in collaboration networks, how diseases propagate or how information flows through a network, or how to control a disease or food outbreak. In electric networks such as power grids, or in internet networks, the questions can concern vulnerability assessment, in order to prepare for a power outage or internet blackout. In biological network analysis, the questions concern how different diseases are related to each other, which can be useful in discovering new symptoms of diseases and in developing new medicines. The importance of this interdisciplinary area of science is clearly due to its widespread applications, which involve scientists and researchers with a variety of backgrounds and interests.
Although networks are much simpler than the original complex systems, the interactions among the nodes in a real-world network may seem random, and capturing patterns in them is not trivial. Many questions about inference on networks remain open, which makes this topic very attractive for researchers in the field. In this dissertation we answer some of these questions in two lines of study: one focused on experimental analyses and one focused on theoretical limitations.
In Chapter 2 we look at community detection, a common graph mining task in network inference, which seeks an unsupervised decomposition of a network into groups based on statistical regularities in network connectivity. Although many such algorithms exist, community detection's No Free Lunch theorem implies that no algorithm can be optimal across all inputs. However, little is known in practice about how different algorithms over or underfit to real networks, or how to reliably assess such behavior across algorithms. We present a broad investigation of over and underfitting across 16 state-of-the-art community detection algorithms applied to a novel benchmark corpus of 572 structurally diverse real-world networks. We find that (i) algorithms vary widely in the number and composition of communities they find, given the same input; (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks; (iii) algorithmic differences induce wide variation in accuracy on link-based learning tasks; and, (iv) no algorithm is always the best at such tasks across all inputs. Finally, we quantify each algorithm's overall tendency to over or underfit to network data using a theoretically principled diagnostic, and discuss the implications for future advances in community detection.
In Chapter 3 we investigate the link prediction problem, another important inference task in complex networks with a wide variety of applications. As we observed in Chapter 2, algorithmic differences in community detection induce wide variation in accuracy on link prediction tasks. Moreover, many link prediction techniques exist in the literature, yet there is still a lack of methodology to analyze and compare them. In Chapter 3, we provide a methodological overview of link prediction techniques and present new results on optimal link prediction and on transfer learning for link prediction. In the former, we investiga
new caerin-like antibacterial peptide from the venom gland of the Iranian scorpion Mesobuthus eupeus: cDNA amplification and sequence analysis
Scorpion venom consists of different types of peptides and proteins, which are encoded by individual genes. A full-length cDNA of 238 base pairs, encoding a 74-amino-acid peptide, was isolated from the venom gland of the Iranian scorpion Mesobuthus eupeus (Buthidae family). This peptide, named M. eupeus caerin-like antimicrobial peptide (Me-CLAP), belongs to a group of antibacterial peptides previously described from scorpions. In this study, the cDNA sequence encoding Me-CLAP was amplified from M. eupeus venom glands using reverse transcriptase polymerase chain reaction (RT-PCR) and then analyzed. Me-CLAP has molecular characteristics similar to those of antimicrobial peptides (AMPs) from the same genus, such as those of Mesobuthus martensii and M. eupeus, and differs more from those of other genera. Keywords: Caerin-like antimicrobial peptide, Mesobuthus eupeus, semi-nested real-time polymerase chain reaction
Evaluating the scale, growth, and origins of right-wing echo chambers on YouTube
Although it is understudied relative to other social media platforms, YouTube
is arguably the largest and most engaging online media consumption platform in
the world. Recently, YouTube's outsize influence has sparked concerns that its
recommendation algorithm systematically directs users to radical right-wing
content. Here we investigate these concerns with large scale longitudinal data
of individuals' browsing behavior spanning January 2016 through December 2019.
Consistent with previous work, we find that political news content accounts for
a relatively small fraction (11%) of consumption on YouTube, and is dominated
by mainstream and largely centrist sources. However, we also find evidence for
a small but growing "echo chamber" of far-right content consumption. Users in
this community show higher engagement and greater "stickiness" than users who
consume any other category of content. Moreover, YouTube accounts for an
increasing fraction of these users' overall online news consumption. Finally,
while the size, intensity, and growth of this echo chamber present real
concerns, we find no evidence that they are caused by YouTube recommendations.
Rather, consumption of radical content on YouTube appears to reflect broader
patterns of news consumption across the web. Our results emphasize the
importance of measuring consumption directly rather than inferring it from
recommendations. Comment: 29 pages, 21 figures, 15 tables
Causally estimating the effect of YouTube's recommender system using counterfactual bots
In recent years, critics of online platforms have raised concerns about the
ability of recommendation algorithms to amplify problematic content, with
potentially radicalizing consequences. However, attempts to evaluate the effect
of recommenders have suffered from a lack of appropriate counterfactuals --
what a user would have viewed in the absence of algorithmic recommendations --
and hence cannot disentangle the effects of the algorithm from a user's
intentions. Here we propose a method that we call "counterfactual bots" to
causally estimate the role of algorithmic recommendations on the consumption of
highly partisan content. By comparing bots that replicate real users'
consumption patterns with "counterfactual" bots that follow rule-based
trajectories, we show that, on average, relying exclusively on the recommender
results in less partisan consumption, where the effect is most pronounced for
heavy partisan consumers. Following a similar method, we also show that if
partisan consumers switch to moderate content, YouTube's sidebar recommender
"forgets" their partisan preference within roughly 30 videos regardless of
their prior history, while homepage recommendations shift more gradually
towards moderate content. Overall, our findings indicate that, at least on
YouTube, individual consumption patterns mostly reflect individual preferences,
where algorithmic recommendations play, if anything, a moderating role.
Environmental Impact Assessment of the Industrial Estate Development Plan with the Geographical Information System and Matrix Methods
Background. The purpose of this study is an environmental impact assessment of an industrial estate development plan. Methods. This cross-sectional study was conducted in 2010 in Isfahan province, Iran. GIS and matrix methods were applied. Data analysis was done to identify the current situation of the region, zone vulnerable areas, and scope the region. Quantitative evaluation was done using the matrix of Wooten and Rau. Results. The net score for the impact of industrial unit operation on the air quality of the project area was (−3). Given the transport of industrial estate pollutants, residential places located within a radius of 2500 meters of the city were expected to be affected more. The net score for the impact of construction of industrial units on plant species of the project area was (−2). Environmentally protected areas were not affected by the air and soil pollutants because of their distance from the industrial estate. Conclusion. Positive effects of project activities outweigh the drawbacks, and the sum of the scores allocated to the project activities on environmental factors was (+37). Overall, the project does not have detrimental effects on the environment or the residential neighborhood. EIA should be considered as an anticipatory, participatory environmental management tool before determining a plan application.